Analysis of biclusters with applications to gene expression data

Authors

  • Gahyun Park
  • Wojciech Szpankowski
Abstract

For a given matrix of size n × m over a finite alphabet A, a bicluster is a submatrix composed of selected columns and rows satisfying a certain property. In microarray analysis one searches for the largest biclusters in which the selected rows constitute the same string (pattern); in another formulation of the problem one tries to find a maximally dense submatrix. In a conceptually similar problem, namely the bipartite clique problem on graphs, one looks for the largest binary submatrix consisting entirely of ‘1’s. In this paper, we assume that the original matrix is generated by a memoryless source over a finite alphabet A. We first consider the case where the selected biclusters are square submatrices and prove that with high probability (whp) the largest (square) bicluster having the same row-pattern is of size $\log_Q^2 nm$, where $Q^{-1}$ ...
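To make the same-row-pattern definition concrete, here is a minimal Python sketch. It is an illustration only and not the authors' method: the function names (`generate_matrix`, `is_row_pattern_bicluster`, `largest_row_group`) and the example alphabet and probabilities are assumptions introduced here. It draws a matrix from a memoryless source, checks whether a chosen set of rows and columns forms a bicluster whose rows all spell the same string, and, for a fixed set of columns, finds the largest group of rows that agree on them.

```python
import random


def generate_matrix(n, m, alphabet, probs, seed=0):
    """Draw an n x m matrix whose entries are i.i.d. symbols (memoryless source)."""
    rng = random.Random(seed)
    return [rng.choices(alphabet, weights=probs, k=m) for _ in range(n)]


def is_row_pattern_bicluster(matrix, rows, cols):
    """Return True if every selected row, restricted to the selected columns,
    spells the same string (pattern)."""
    if not rows or not cols:
        return False
    pattern = [matrix[rows[0]][c] for c in cols]
    return all([matrix[r][c] for c in cols] == pattern for r in rows)


def largest_row_group(matrix, cols):
    """For a fixed column selection, group rows by the pattern they show on
    those columns and return the indices of the largest group."""
    groups = {}
    for r, row in enumerate(matrix):
        key = tuple(row[c] for c in cols)
        groups.setdefault(key, []).append(r)
    return max(groups.values(), key=len)


# Hypothetical example: a 6 x 8 matrix over the alphabet {a, b, c}.
example = generate_matrix(6, 8, ["a", "b", "c"], [0.5, 0.3, 0.2], seed=1)
rows = largest_row_group(example, [0, 1])
print(rows, is_row_pattern_bicluster(example, rows, [0, 1]))
```

Fixing the columns and grouping the rows is only the easy half of the problem; searching over all column selections for the largest bicluster is the hard part that the paper analyzes probabilistically.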


Related articles

Constrained Subspace Clustering for Time Series Gene Expression Data

For time series gene expression data, it is an important problem to find subgroups of genes with similar expression patterns in a consecutive time window. In this paper, we extend a fuzzy c-means clustering algorithm to construct two models that detect, respectively, constant value biclusters and similarity-based biclusters whose gene expression profiles are similar within consecut...


Biclustering Analysis of Coregulated Biclusters from Gene Expression Data

In this paper, a biclustering analysis of coregulated biclusters from gene expression data is carried out. Gene expression is the process that produces a functional product from gene information. Data mining is used to find relevant and useful information in databases. Clustering groups genes according to given conditions. Biclustering algorithms belong to a distinct class of cl...


Evaluation of Plaid Models in Biclustering of Gene Expression Data

Background. Biclustering algorithms for the analysis of high-dimensional gene expression data were proposed. Among them, the plaid model is arguably one of the most flexible biclustering models up to now. Objective. The main goal of this study is to provide an evaluation of plaid models. To that end, we will investigate this model on both simulation data and real gene expression datasets. Metho...


CCC-Bicluster Analysis for Time Series Gene Expression Data

Many biclustering problems have been shown to be NP-complete. However, when one is interested in identifying biclusters in time series expression data, the problem can be restricted to finding only maximal biclusters with contiguous columns. This restriction leads to a well-behaved problem. Its motivation is the fact that biological processes start and conclude in an identifiable contiguous p...


A novel biclustering approach with iterative optimization to analyze gene expression data

OBJECTIVE: With the dramatic increase in microarray data, biclustering has become a promising tool for gene expression analysis. Biclustering has been proven to be superior to clustering in identifying multifunctional genes and searching for co-expressed genes under a few specific conditions; that is, a subgroup of all conditions. Biclustering based on a genetic algorithm (GA) has shown better...


Comparison of Biological Significance of Biclusters of SIMBIC and SIMBIC+ Biclustering Models

The query-driven biclustering model refers to the problem of extracting biclusters based on a query gene or query condition. The extracted biclusters consist of a set of genes and a subset of conditions that are similar to the query gene or query condition, and they also include the query input. Two approaches applied to biclustering problems are top-down and bottom-up, based on how they tackle the p...




Publication date: 2005